Fables of reconstruction: controlling bias in the dark energy equation of state 
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We develop an efficient, non-parametric Bayesian method for reconstructing the time evolution 
of the dark energy equation of state w(z) from observational data. Of particular importance is the 
choice of prior, which must be chosen carefully to minimise variance and bias in the reconstruction. 
Using a principal component analysis, we show how a correlated prior can be used to create a smooth 
reconstruction and also avoid bias in the mean behaviour of w(z). We test our method using Wiener 
reconstructions based on Fisher matrix projections, and also against more realistic MCMC analyses 
of simulated data sets for Planck and a future space-based dark energy mission. While the accuracy 
of our reconstruction depends on the smoothness of the assumed w(z), the relative error for typical 
dark energy models is < 10% out to redshift z — 1.5. 
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I. INTRODUCTION 

The nature of dark energy (DE), the source driving 
the acceleration of the universe in the framework of gen- 
eral relativity, has remained a mystery since the accel- 
eration was first discovered [1]. With the accumulating 
observational data, including supernovae (SN), cosmic 
microwave background radiation (CMB) and large scale 
structure (LSS) data, we hope to understand whether 
DE is a cosmological constant or whether it is dynami- 
cal. Dynamical dark energy is characterised by having an 
cquation-of-state w ^ — 1, and which in general could be 
a function of redshift w(z) or, equivalcntly, the scale fac- 
tor w(a). One of the key goals in dark energy observations 
is to constrain the equation of state and its evolution. 

The time evolution of w can, in principle, be re- 
constructed from data using either parametric or non- 
parametric methods (e.g. [2]). Non-parametric methods 
have the advantage that they do not assume an ad hoc 
functional form of w(z), which could lead one to miss ev- 
idence for a different kind of evolution. Unfortunately, 
non-parametric methods generically contain many de- 
grees of freedom which can lead to parameter degenera- 
cies, making them difficult to explore. 

Many efforts have been made in the literature to de- 
velop non-parametric methods, which can vary greatly 
depending on the nature of the data and the choices of 
the functions to be reconstructed. One particular focus 
has been using SN and other standard candle data to re- 
construct various ways of paramterizing the background 
cosmology; the techniques either fit directly to SN mag- 
nitudes or their luminosity distances Dl(z), or to more 
indirect quantities such as dark energy density, pde{z), 
the expansion history, H(z), and the equation of state 
w(z), or much more indirect quantities such as the po- 
tential of the DE scalar field(s) [3-13]. 

Unfortunately, many of these techniques are specific to 
luminosity distance measurements, or more generally to 



measures of the background expansion rate, making it 
difficult to include other types of data. The background 
measurements can be limited by our knowledge of other 
parameters, in particular the value of the dark energy 
density today, FIde [5, 8]. Thus, it is useful to include 
measurements which could be sensitive to the growth 
of structure; such measurements, including weak lens- 
ing, galaxy clustering and the integrated Sachs- Wolfe ef- 
fect (ISW), can help to break the degeneracies and can 
potentially be the basis for consistency tests of the DE 
paradigm [14, 15]. 

When comparing background and structure evolution 
data, it is useful to focus on more indirect parameteri- 
sations like the equation of state, w(z). Non-parametric 
approaches typically would expand w(z) in terms of bins 
or basis functions. A powerful tool is the principal com- 
ponent approach, which uses forecasts of the future ex- 
periments in the form of Fisher matrix projections to 
find linear combinations of these basis functions which 
are independent and well determined [13, 16-22]. 

One reconstruction approach is to make a truncation of 
the best constrained principal components, but the num- 
ber of modes kept is an open question. This often is de- 
cided without reference to the data, potentially missing 
evidence because it is not expected. This simple trun- 
cation can lead to significant biases and unrealistically 
small errors in the reconstruction where the data are poor 
or are absent entirely [16]. It is clear that some bias in 
reconstructions is inevitable, particularly when one at- 
tempts to reconstruct w(z) in regimes where the data 
are poor or are absent entirely. However, it is worth in- 
vestigating whether a better reconstruction method can 
be found. 

In a fully Bayesian approach, one can explicitly specify 
a prior on the behaviour of w(z), which then determines 
which modes are kept in the reconstruction, based on 
where the evidence from the data outweighs the prior. 
But the crucial question is then to understand the best 
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way to specify priors in order to minimise the bias, while 
at the same time reducing the impact of data noise in the 
reconstruction. Here we extend previous work, where we 
assumed that w(z) could be treated as a Gaussian ran- 
dom field with a proposed correlation function [17]; this 
correlation function effectively enforces a smoothness on 
the reconstruction. Our approach is close in spirit to re- 
cently proposed Gaussian Process (GP) method [23, 24], 
which applied a similar prior to reconstructions of w(z) 
using SN data, though there are important differences as 
described below. 

In this paper we discuss how to choose a prior and im- 
plement it in a computationally efficient method to recon- 
struct w(z) non-parametrically from the observed data. 
As we will show, it is very accurate (with relative error < 
10% for a range of dark energy models), computationally 
cheap and fast, straightforward to implement, and can 
be used to analyse any kind of cosmological data. Af- 
ter discussing general issues regarding the choice of prior 
correlation functions, we test our reconstruction accu- 
racy using the simulated mock data for Planck [27] and 
a future space-based mission. We then conclude with a 
discussion of how to extend this method to other param- 
eterization and how to calculate more realistic priors. 

II. RECONSTRUCTION METHODS 

Reconstruction methods in general have been well ex- 
plored; here we focus on a Bayesian method, so to make 
our assumptions as explicit as possible. Whatever the 
method, one needs some metric to evaluate the quality 
of the reconstructions. One natural choice of metric, or 
risk function, is the mean squared error (MSE) of the 
binned equation of state, between the true model and 
the reconstructed one: 

MSE = J2( w T Ue ~ < ocon ) 2 - (1) 

i 

The MSE is not the only possible choice, but it is common 
and we adopt it here. 

Our Bayesian method must assume an a priori prob- 
ability distribution in the space of models. In the ab- 
sence of data, the reconstruction will return some fidu- 
cial model, w fid , which is the peak of the prior distri- 
bution; if the fiducial model differs from the true one, 
the mean reconstructed model will be biased, where 
w m ™ = (w rocon ) is averaged over the ensemble of pos- 
sible data consistent with the true model. The ensemble 
average of the MSE can be shown to be the sum of two 
terms: the variance with which the mean model will be 
reconstructed and the bias between the true model and 
the mean one: 

(MSE) = ^((< lcan -< ocon ) 2 ) + (u.f uc -< lcan ) 2 . (2) 

i 

The challenge of reconstruction is to keep both types of 
errors small; a stronger prior will reduce the variance 



of the reconstruction, but will increase the bias if the 
fiducial and true models do not match. 

Here we focus on a fully Bayesian reconstruction 
method. In it, wc simply assume a prior probability dis- 
tribution on the function we wish to reconstruct, and our 
reconstructed function is the maximum of the posterior 
probability distribution, which is the product of the data 
likelihood and the prior. Thus, the reconstruction method 
is entirely specified by the definition of the prior and this 
makes our reconstruction assumptions fully explicit. 

We try to choose the prior to minimise the bias; how- 
ever, this is intrinsically subjective because it depends 
on the expected 'true' model. In principle, this should be 
determined by theoretical considerations and/or previ- 
ous observations. Given that we are making projections 
based on the most optimal future data, previous observa- 
tions are not expected to have significant impact, making 
theoretical considerations key. Theoretical priors can be 
made by considering ensembles of possible models, either 
using a combination of parametric models for the equa- 
tion of state, or by using physical models; e.g., by putting 
a prior on the space of possible quintessence potentials. 
However, it is clear that this choice could vary wildly 
between theorists. 

One of the problems of using a non-parametric models 
is the sensitivity of answers to the binning scheme as- 
sumed. If few bins are assumed, then the answers are well 
determined, but the resulting reconstruction has unphys- 
ical discrete structures resulting from the binning. But if 
many bins are assumed, some degrees of freedom (d.o.f.) 
will not be constrained by the data, resulting in large 
variance and slow convergence of Monte Carlo Markov 
chain (MCMC) methods. An advantage of the Bayesian 
approach is that we can use many bins, but then con- 
strain the residual degrees of freedom by imposing a prior 
on the space of functions. This prior tends to make the 
functions smoother, reflecting our preconceptions of how 
the equation of state should evolve. 

Another common approach to reconstructions is to 
simply smooth the data, e.g. by using some implementa- 
tion of a low-pass filter, on the assumption that the mod- 
els will not have high frequency variations. This equates 
to an infinitely strong prior, which does not allow high 
frequency modes in the reconstruction, no matter how 
strong the evidence for them may be. While this is possi- 
ble to implement with the methods we describe, it seems 
more reasonable to set some strong but finite prior based 
on theoretical considerations; that way, high frequency 
modes could, in principle, enter the reconstruction if 
the evidence for them becomes sufficiently strong (where 
the definition of sufficient is based purely on theoretical 
grounds.) 

Here we focus on phenomenological choices for the 
prior, and make recommendations based on simple con- 
siderations, but allow that this will be up to individual 
choice. 
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III. THE CORRELATED PRIOR 
A. The correlation function 

The prior distribution could be an arbitrarily com- 
plex multi-variate probability distribution for the non- 
parametric amplitudes. We assume this distribution is 
Gaussian, meaning it is specified by a covariance matrix 
describing fluctuations around a mean or fiducial model, 
Wfid- Here for simplicity, we further assume that the co- 
variance matrix is a function only of the scale factor and 
in particular that it is translation independent, depend- 
ing only on the difference in scale factor \a — a'\. 

These assumptions allow us to specify the prior using 
a simple one-dimensional function, defined in [17] as 

U\a - a'\) = (lw(a) - w Rd (a)][w(a') - w Rd (a')]) , (3) 

which can be used to generate the likelihood for any 
w(a). This has the effect of reducing the degrees of free- 
dom of the function, effectively binding together neigh- 
boring bins below some specified correlation length a c . 
This is precisely the kind of prior assumed in the Gaus- 
sian process approach [23, 24], though the shape and 
paramctrization of the correlation function can differ con- 
siderably. In Ref. [17], we applied this smoothness prior 
to the Fisher forecasts, and found it a natural way to 
quantify how much we could learn from future data. 

Here wc choose the scale factor, a, as our independent 
variable, which is somewhat arbitrary and in our ear- 
lier work [17] instead used rcdshift. A function which is 
translation invariant in one variable will not be precisely 
translation invariant in another, meaning the priors can- 
not be equivalent. However, if our answers strongly de- 
pend on such differences, then likely the priors have been 
made too strong. 

B. Implementation of the correlation prior 

We start by discrctising w(a) at a > a m ; n using bins 
uniform in scale factor a, and use a wide bin for a < a m j n . 
We have checked that using uniform bins in ln(a) or z re- 
turns a consistent result when the number of bins N is 
large enough, namely, N > 20 with a m - m = 0.4, which we 
adopt in this work. Note that we use tanh bins rather 
than tophat bins so that the time derivative of w is sta- 
ble at any a, which is an important issue for calculating 
the dark energy perturbations when using CMB and LSS 
data. 

Given some correlation shape and choice of fiducial 
model, it is straight forward to discretize them given our 
choice of binning. Let us assume the i th bin is from ai 
to ai + A, and for simplicity we will assume that all bins 
have the same width A = a.; + i — a.;. The equation of state 
averaged over each bin is given by 

^ rai+A 

w.i = — / daw (a). (4) 



We can write the variation from the true model averaged 
over the bin as, 8wi = u>' rue — wf d . Calculating the co- 
variance matrix of the binned equation of state is then 
straightforward: 

^ rdi+A i-aj+A 

Cij = (SwiSwj) = I da I da'£ w (\a - a'\). 

(5) 

Given this, the prior for the model can be written as 

TVior(w) oc e -(w-w"rc-*(w-w**)/a_ (6) 

Subsequently, in the MCMC, we minimize the total pos- 
terior x 2 defined as 

2 _ 2 i 2 (7\ 

where 

xl liol = -21n7Vior = (w - w fid ) T C" 1 (w - w fld ). (8) 

In Sec. IV we show that one can marginalize over pos- 
sible choices of w fid , or define a procedure by which w fid 
is defined using an averaging of w. In such a case, the 
prior probability can be written as 

P(W) OC e -w T C- 1 w/2 ; (9) 

where C is a modified version of the correlation matrix, 
meaning that 

4 rior = w^C^w . (10) 

The correlation function typically provides a frequency 
dependent prior: high frequency oscillations are sup- 
pressed by the prior, while the low frequency modes are 
largely unaffected and are so dependent on the data. Pro- 
viding a prior stabilizes the high frequency variances and 
allows us to focus on the more interesting low frequency 
modes. It also significantly improves the MCMC con- 
vergence, as the flat directions are now curtailed by the 
prior. Also, as long as there are sufficient bins compared 
to the correlation length, the prior largely wipes out de- 
pendence on the precise choice of binning. 

C. The Wiener filter 

To reconstruct w(a) from real data, we simply fold the 
theoretical prior into our MCMC search using Eq. (7) 
and search for the optimal model. One can gain analyti- 
cal insight into reconstructions using the correlated prior 
by exploiting similar arguments used for Wiener filtering, 
a well understood methodology for reconstructing a sig- 
nal in the presence of noise. The Wiener approach can be 
applied in this context once we recognise that the Fisher 
matrix (or its inverse) describes the expected noise co- 
variance in the w(a) reconstruction, while our theoretical 
prior describes our expected signal covariance. 
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We assume that we have some noisy data vector which 
is the best fit to the observed data, and we wish to esti- 
mate the best actual w(a) given the data has noise de- 
scribed by the Fisher matrix, Fij. The expected likeli- 
hood is given by 



P(w° b » CX e -(w° b =-w) r F(w-=-w)/2^ 



(11) 



Combining this with the theoretical prior defined above, 
we find the posterior distribution ^(wlw * 351 ). We define 
the reconstructed equation of state as that which maxi- 
mizes the posterior; we find 



F(w 



rccon obs 
— W 



cr^w™ 



ficK 



0, (12) 



which has the solution, 



= F-^C + F' 1 ) 



— l„,fid 



C(C + F- 1 )- 1 ^ 



(13) 

Assuming that the noise dominates in the higher frequen- 
cies, this is effectively a low-pass filtering of the data 
combined with a high-pass filtering of the fiducial model. 
Given most choices of fiducial models will be smooth, the 
latter should have minimal effect. 

As mentioned above and detailed in Sec. IV, it is pos- 
sible to eliminate the explicit dependence on w fid by in- 
troducing a modified correlation matrix C with the prior 
probability given by Eq. (9). Then the maximum poste- 
rior solution becomes simply 



4 w obs . 



(14) 



This analytic solution is useful for projections, which 
we present in Sec. VB2. When working with real data, 
it is very difficult to estimate w obs , which optimises the 
observations alone, because flat directions in parameter 
space make the MCMC convergence impossible when the 
number of bins is large. But when searching for w recon , 
such flat directions are curtailed by the prior. 



IV. EXPLORING SPECIFIC CORRELATED 
PRIORS 

A. The choice of correlation function 

The choice of amplitude and shape of the prior can 
critically effect the reconstructed w(a) and other con- 
clusions, such as the number of dark energy parameters 
which can be constrained. The prior should incorporate 
both our theoretical prejudices and other earlier cosmo- 
logical data which have not been explicitly included in the 
present analysis. Theoretically, one might examine the 
equation of state dynamics of the quintessence field, be- 
ginning with some measure on the space of quintessence 
potentials (e.g. [28], also see [29].) Earlier data can also be 
used to decide on the prior. However, it is not clear how 
useful this is, as the earlier data will naturally be weaker 
than the data being considered, so it would be surprising 



if it resulted in stronger constraints on any degree of free- 
dom. If the experiment provides independent constraints, 
it should be incorporated explicitly rather than using a 
heuristic prior model which can misrepresent the distri- 
bution of the information. Here we attempt a more prag- 
matic approach, by using simulated data and attempting 
to reconstruct a number of 'typical' dark energy models. 

One choice is to assume that the prior is diagonal in 
the binning, effectively assuming the correlation is a delta 
function [20]. If all bins have the same prior, the correla- 
tion matrix is proportional to the identity matrix, which 
has the advantage that the principal components of data 
remain unchanged when the prior is included. However, 
such a prior lacks the bin independence and smoothing 
properties of a more realistic prior choice. 

In previous work [17], we worked with redshift z as the 
variable, and modeled the correlation function using a 
form £ w (6z) = £ w (0)/[l + (Sz/z c ) 2 ]. Here, we adopt the 
same form expressed in terms of the scale factor: 



(15) 



1 + {Sa/a c ) 2 ' 

where a c describes the typical smoothing distance, and 
£w (0) which is a normalising factor that relates to the am- 
plitude of the expected variance of w(a). In what follows, 
we will refer to this model as CPZ. The normalisation can 
also be chosen by specifying the allowed variance of the 
average equation of state, 



2 _ 



da da' £ w (a — a') 7r£(0)a c 



(1 - a min ) 5 



1 - a„ 



(16) 



The precise shape assumed was somewhat arbitrary, cho- 
sen for simplicity and ease of use; while the correlation 
is expected to decrease, the decline could be steeper a 
steeper power, or exponential rather than a power law. 
The Gaussian process [23, 24] work instead typically as- 
sumes an exponential fall off, with 'hyper-parameters' 
determining the shape; some distribution is assumed for 
these hyper-parameters and then they are marginalised 
over. 



B. Comparing different correlation shapes 

We compare a few different shapes for the correlation 
functions, to see the impact of their qualitative features. 
In addition to the form discussed above, £cpz(<5a) = 
£ w (0)/[l + (5a/a c ) 2 }, we also consider an exponential fall 
off, £ex P (<5a) = £u)(0)e _<5a ' 00 , and a general power law 
form, £ P ow(<5a) = (8a/a c )~ n . We normalise each to the 
same mean variance, a 2 ^ and set a c = 0.06, apart from 
the power law form where specifying the variance fixes 
the value of a c . We then compare the discretised corre- 
lation matrices over the range a = [0.4, 1.] The different 
shapes are compared in Figure 1. 

We first consider the behaviour of the functions at 
small separations, where it can either approach a con- 
stant or diverge for the power law models. If the slope 
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FIG. 1. Different prior shapes normalised to the same mean 
variance and a c = 0.06: CPZ (black, solid), exponential (red, 
dashed), power law (blue, dotted). 



FIG. 3. Different prior eigenvalues normalised to the same 
mean variance: CPZ (black, circles), exponential (red, 
squares), power law (blue, diamonds). The normalisation re- 
sults in the same lowest eigenvalue, and modes increase in 
frequency. 




scale factor 



FIG. 2. Some of the normalised eigenvalues of the prior. 
Changes to the prior description have little effect on the 
modes themselves. The modes are largely ordered in fre- 
quency (black, red, green, blue), with the lowest frequen- 
cies the least constrained by the prior. The highest frequency 
mode (dashed) is the most constrained. The deviation from 
the simple Fourier transform is due to the boundaries on w. 



is too steep, the discretiscd diagonal correlation also di- 
verges unless a small scale cut-off is imposed. In the limit 
that this cut-off becomes small, the correlation matrix ef- 
fectively becomes diagonal which we consider as a sepa- 
rate case. Below, we focus on the power law case only for 
n = 1/2, a shape shallow enough to avoid the divergence. 

The impact of these different ways of defining priors 
is perhaps most clearly seen in the eigenvalues of the 
associated prior matrices. To illustrate this, we choose 



a binning (20 bins between a = 0.4 and a = 1.) and 
calculate the corresponding correlation matrices, and find 
the eigenmodes and their eigenvalues in this binning. 

Perhaps surprisingly, virtually all the ways of defining 
the prior result in similar prior eigenmodes, and in iden- 
tical ordering for the eigenvalues of these modes. A few of 
the modes are shown in Fig. 2, where it is clear that the 
modes correspond closely with the Fourier basis, apart 
from minor differences which arise from the scale factor 
boundaries. Because correlations fall off with distance, 
the most constrained modes are those with the highest 
frequencies. Slowly varying modes are least constrained 
by the prior. (In the limit of a diagonal correlation ma- 
trix, the eigenvalues are degenerate and the mode defini- 
tions are arbitrary.) 

However, the different correlation shapes significantly 
change the associated eigenvalues, as can be seen in Fig. 
3. Because the models are normalised to the same mean 
variance, the lowest eigenvalues are the same. However, 
the spacing of the higher values is significantly different. 
For the CPZ model, the eigenvalues are nearly equally 
spaced in log, and are steepest of these three models for a 
given a c , leading to the most constrained high frequency 
modes. This spacing directly depends on a c : the longer 
the correlation distance, the steeper the spectrum. In the 
limit of small o c , the matrix becomes diagonal. Because 
of its relatively simple behaviour and transparent depen- 
dence on its parameters, we adopt the CPZ form as our 
default model below. 



C. Changing the fiducial model 
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The prior distribution function, as defined in (6), de- 
pends explicitly on the choice of the fiducial equation of 
state ui fid (a). For simplicity, we could assume the fidu- 
cial model to be w = — 1, but arguably this is ad hoc. 
Formally the priors should be chosen from fundamental 
physics; a Gaussian prior centred at w = — 1 is unphysi- 
cal in many simple models that do not allow the so called 
'phantom' [30] region, w < — 1, which violates the null en- 
ergy condition. Dynamical models like quintessence may 
asymptote to cosmological constant behaviour at early 
or late times (thawing or freezing models), but are al- 
ways constrained to have w > — 1. While phantom mod- 
els are possible, models which cross the 'phantom divide' 
(dubbed 'quintom' models [31]) can also be a struggle to 
build in a natural way. In our present phenomenological 
approach, it makes more sense to choose a prior which 
does not favour a particular value. Since we wish to test 
ACDM, we prefer to use a prior which does not favor 
w = —1 over another model, e.g., w = —0.9. 

One way to eliminate the dependence on u> fid (a) is 
to marginalise over it. For a constant ui fid this can be 
done analytically; we can write the fiducial model as 
w fid = w fid u, where u = (1,1,1,...) and we assume a 
weak Gaussian prior on u; fid with variance given by cr^ d . 
Marginalising over w fid has the effect of changing the ef- 
fective correlation matrix: 

TVior OC J rf w fid e -(w-t U f ' d u)C- 1 (w-«, f ' d u)/2-(t U fid ) 2 /2<T f 2 ld 

oc ; (17) 

where 



' ' l l ^ ' % 1 —2, sy—1 



Such expressions are a common result when marginalis- 
ing over nuisance parameters (e.g. [32]) and can be sim- 
ply inverted using the Sherman-Morrison formula [33] , a 
special case of the Woodbury formula [34], to find: 



Cij — Cij + cxg^iiiUj. 



(19) 



Here we see the utility of introducing a prior on the fidu- 
cial value, however weak - a flat prior corresponds to 
an infinite variance, which makes C undefined. Most im- 
portantly, marginalising assigns large noise to the flat 
mode in the prior and ensures that for reconstructions 
this mode is determined by the data alone. 

An alternative approach is to define the fiducial model 
to be a function of the assumed equation of state 
(w (w)) rather than being fixed; for example, it could 
float to a constant value given by the average of the equa- 
tion of state, 



..fid 



1 



a f 



a f i 

daw(a) = —T-iWi. (20) 
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FIG. 4. Impact on the eigenspectrum of various transforms 
considered. The original CPZ spectrum is in black (solid cir- 
cles). Marginalising over the mean only zeroes out the first 
eigenvalue. Using a running average tends to reduce the low- 
est frequency modes while leaving the higher frequency con- 
straints unchanged. (11 bin average - open circles, 5 bin av- 
erage - open squares, 3 bin average - open diamonds.) Taking 
the CPZ prior form for the derivative results in the green 
triangles; the spectrum steepens and the first mode is zeroed. 



Such a change, where the fiducial model is a linear trans- 
formation of the underlying model, w = Sxv is also 
equivalent to simply changing the effective correlation 
matrix. The explicit dependence of the prior on the fidu- 
cial model is absorbed into a modified correlation matrix, 



c; 



(5 ik - S lk )C kl (dij - Sij). 



(21) 



In the case above where the fiducial model is the average 
over all bins, we have Sij = UiUj/N, implying 



(18) <V = C ij 1 --(u i u k C kj 1 +C ik 1 u kUj )+^u i u j u k C k[ 1 



N 2 



'W u i- 
(22) 

This is similar to the result for marginalisation and, like 
that case, can result in a degenerate direction unless an 
additional prior is assumed; without this, the matrix can- 
not be inverted. 

The implementation of a 'floating' fiducial model can 
be based on other smoothing schemes. We might instead 
define the fiducial model as a more local average of the 
true model, which reduces the prior penalty for the longer 
wavelength modes, making them more responsive to the 
data. For example, we could have 



,.fid 



I ci j —a-i | <a c 



(23) 



where Nj denotes the number of the neighboring w bins 
around the ith bin Wi within the 'correlation' radius a c . 
We tried other smoothing methods, such as a Gaussian 
kernel scheme, and found very similar results. 

These modifications to the fiducial model significantly 
affect the eigenvalues, particularly the one for the con- 
stant w mode, which is effectively reset to zero (or the 
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prior that it has been explicitly given.) Both approaches, 
marginalising over the mean value or redefining the fidu- 
cial model to the global average, do this while leaving 
the other eigenvalues unchanged, just as expected. Using 
a more local definition of the fiducial model also reduces 
the constraints on other long wavelength modes while 
leaving the shorter ones mostly unchanged, as can be 
seen in Fig. 4. We adopt this local averaging method of 
Eq. (23) when performing the reconstructions in Sec. V 



D. Correlations of w(a) derivatives 

Another way to ensure that the average of the fiducial 
model does not impact the reconstruction is to apply a 
prior instead to the derivative of the w(a) function. Es- 
sentially, we specify a correlation matrix for the difference 
of the equation of state between bins, 



((W l - W i+1 )(Wj - w j+1 )) = A. 



(24) 



This has one less dimension than the correlation of the w 
bins itself. Again, we can define a Gaussian prior based 
on this correlation matrix. 



■p(w) oc e~('" , ~ t0!+1 ) T£>i J 1 ( t0i_ "' 3+1 ^ 2 . 



(25) 



This can be thought of as a prior on the bin values them- 
selves with an alternative inverse correlation defined by, 



Cl 



Dl 



D 



i-h3 



D 



D 



(26) 



The indices of C span [1,AT], while those for D span 
[1,N ~ 1]; where an index for D exceeds its range (i.e. 
i = 0, N), the inverse matrix is taken to be zero. 

While C^j 1 is well defined in this way, it is not invertible 
because the constant w mode is not constrained in any 
way. If required, this can be combined with a weak prior 
on the constant w mode to yield a correlation matrix 
which is invertible. 

Assuming the fiducial CPZ correlation for the deriva- 
tive of w (rather than for w itself) results in a new spec- 
trum of eigenvalues. As expected, the constraint on the 
mean mode is reset to zero. However, it also results in a 
steeper spectrum than the original one, as can be seen in 
Fig. 4. This is roughly equivalent to using a smaller cor- 
relation length in the original prescription, and so would 
not qualitative change the analysis below. 



RECONSTRUCTING THE EQUATION OF 
STATE 



Next we combine the priors we have defined with the 
data to see how effective they will be in reconstructing 
w(a). We do this in two ways: first, using calculations 
of the Fisher matrices for a set of observations to test 
Wiener reconstructions of some sample w(a) functions, 
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FIG. 5. Eigenvalues from the data (red circles) and prior in- 
dependently (blue squares) and combined (black stars). The 
modes are arranged roughly from low frequency to high fre- 
quency, though this correspondence is not exact when the 
data are included. The low frequency modes are constrained 
by the data, while the high frequency modes are constrained 
by the prior. 



effectively assuming a simple Gaussian approximation to 
the likelihood. Second, we create more realistic simula- 
tions of the expected CMB, SN and H{z) data and re- 
construct w(a) by finding the maximum posterior model 
using MCMC techniques. 



A. Data assumptions 

We test our reconstruction algorithms by combining 
the CMB distance prior [35] from Planck [27] with su- 
pernovae (SN) and galaxy-galaxy correlation measure- 
ments potentially possible for a future space-based dark 
energy mission. Note that our assumptions are close to 
those that could be achieved from a next generation 
survey such as Euclid, but do not exactly match those 
now expected for the default Euclid mission [25, 26]. For 
the CMB, we use a Fisher matrix forecast based on the 
Planck data. For the SN data, we assume 4000 SN dis- 
tributed in 14 redshift bins from z — 0.15 to z = 1.55 
for a deep SN survey [36]. We also include 300 low-z SN 
from the Nearby Supernova Factory [37] to improve the 
constraint and assume a Gaussian noise with a variance 
(7 = 0.13 for all the data points. 

The measurement of the expansion rate at different 
redshifts H(z) can tighten constraints on w(a), since 
the DE energy density is directly related to the Hub- 
ble rate at a given redshift (e.g., [38]). One of the most 
promising ways of estimating H{z) from future spectro- 
scopic surveys is to observe the Alcock-Paczynski effect 
in the power spectrum or correlation function of galax- 
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ics [39], exploiting features such as the baryon acoustic 
oscillations (BAO) and the power spectrum peak. The 
BAO feature in the correlation function is at a fixed 
comoving scale which is determined by the physics at 
recombination; measuring the rcdshift difference of this 
feature in the radial correlation function provides a di- 
rect local measurement of H(z). We divide the galaxies 
(0.35 < z < 2.5) into bins of width Az = 0.1 and predict 
the measurement errors cfh(z) m each redshift bin, using 
the Fisher matrix technique of [40]. We assume that the 
spaced-based survey would cover 20,000 deg 2 surveying 
area and will detect emission line galaxies with the red- 
shift precision of a z = 0.001(1 + z). For our projections, 
we adopt the values derived in [41] for the number density 
of emission line galaxies and the bias model of [42]. 

Given these data models, we consider the following set 
of cosmological parameters, 



P = (u b ,uj c ,Q s ,T,n s ,A s ,w 1 ,...,W2o,Ar), 



(27) 



where u>b and uj c are the physical baryon and cold dark 
matter densities relative to the critical density respec- 
tively, S is the ratio of the sound horizon to the angular 
diameter distance at decoupling, r denotes the optical 
depth to rcionization, n s and A s are the primordial power 
spectral index and amplitude respectively and wi, W20 
denote the 20 w bins uniform in a. We also include and 
marginalize over one nuisance parameter J\f for super- 
novae, which accounts for the calibration uncertainty in 
measuring the supernova intrinsic luminosity. The details 
of the Fisher forecast and the related survey parame- 
ters can be found in [43] and [44]. After marginalising 
over the cosmological and nuisance parameters, the pro- 
jected constraint on a constant equation of state model 
is (7,7, = 0.015. 



B. Reconstruction Forecasts 

A Fisher matrix analysis provides a means of project- 
ing the constraints possible for the experiments, under 
the optimistic assumption that the resulting likelihood is 
Gaussian (Eq. 11.) Combined with our assumed Gaussian 
prior, we can analytically forecast the expected recon- 
structions when assuming a particular underlying w(a) 
using the Wiener filter approach described in Section 
IIIC. This approach also allows us to quantify the ex- 
pected bias and variance for a given input model. 



"I 1 1 1 1 1 1 1 

n-th best data mode 
- corresponding data+prior mode 
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FIG. 6. The first four best constrained eigenmodes of the data 
covariance matrix (solid lines), along with the corresponding 
modes of the data+prior covariance (dashed lines). As seen in 
Fig. 5, the data eigenvalues become comparable to the prior 
eigenvalues at the n = 4 mode, and the n > 4 modes are 
completely determined by the prior. 



high frequencies, and the most constrained modes of each 
survive in the combined matrix. However, this is only ap- 
proximate since the data and prior have different eigen- 
modes; in particular, the data modes are more weighted 
to the lower redshifts. 

Fig. 5 shows the impact of combining the data and 
prior on the eigenvalues. The cross over point is crucial, 
as it determines the reconstruction bias, while the slope 
of the prior eigenvalues will determine the variance of the 
reconstructions. With the data described above and our 
fiducial prior choice, only 4 data modes survive to play 
a role in the reconstructions. The surviving modes are 
shown in Fig. 6. One can see that, while the shapes are 
distorted by the prior, their basic features remain in tact. 



1. Combined eigenmodes 

When combining data with the prior, the effective re- 
sulting Fisher matrix changes from F — > F + C _1 . Many 
of the general aspects of reconstructions can be under- 
stood by considering the eigenvalues of this combined 
matrix. Roughly speaking, the data constrain the low 
frequency modes the best, while the prior constrains the 



More modes would survive if the data were better, or if 
the prior were weaker (e.g. reducing the prior normalisa- 
tion On, or the slope a c .) The benefit of a weaker prior is 
that more modes would be used to reconstruct the data, 
giving smaller potential bias. The cost would be signifi- 
cantly more variance in the reconstructions. The key is- 
sue is whether the true underlying w(a) will require more 
modes to achieve a small bias. 
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w = - 1 w(a) = -1.2+0.8 (1-a) w(a) = -l-21na exp[-ln 2 a/0.4 2 ] w(a) = -l+0.2tanh[1n(a/Q.75)/0. 1 ] 
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FIG. 7. Wiener reconstructions of the various test functions, based on the Fisher projections for a joint data set of SN + H(z) 
+ CMB distance prior (Planck), and using our fiducial prior assumptions. 



2. Wiener filter projections 

If the Gaussian approximation to the likelihoods are 
reasonable, then the Fisher matrix should allow a good 
approximation to the expected errors on the measured Wi 
bins. We can make realisations of the observed binned 
Wi data by assuming some true underlying model and 
adding to it random noise derived using the principal 
components of the Fisher matrix. Each principal compo- 
nent is assigned a random Gaussian amplitude with the 
appropriate variance, and these are summed to get the 
total noise in the binned data. These realisations are then 
Wiener smoothed using Eq. (14) to get the reconstructed 
w(a). (Note that when C makes negligible constraint on 
the average, then the term F _1 (C + F~ 1 ) _1 w can be 
neglected if the fiducial model itself is constant, making 
Eqns. (13) and (14) equivalent.) 

With these assumptions, we can derive analytically 
the mean reconstructed models, as well as the variance 
and bias expected. For a given w tme , the average recon- 
structed model is 

w mcan = + p-lj-l^riM ( 2 8) 

This implies that that bias contribution to the MSE is 
^K true -uC oan ) 2 = IIF-^C + F- 1 )-^*" 10 !! 2 . (29) 

i 

We can similarly predict the variance contribution under 
these assumptions to find, 

^(« lcan - < con ) 2 ) = Tr[(F + C- X )- 2 F]. (30) 

i 

Note that in this approximation, the variance is inde- 
pendent of the assumed model. We have confirmed these 
analytic results using 20,000 realisations for each input 
model. 

We consider four basic underlying phcnomcnological 
models (w truc ) to test the ability of our algorithm to 



capture different kinds of evolving features in w{a): 

Wconst = - 1-0 

w lin = - 1.2 + 0.8(1 - a) 
Wfcat = - 1 + 21na • exp(-ln 2 a/0.4 2 ) 
wtrans = - 1 + 0.2 tanh(ln(a/a t )/A) . (31) 

We refer to these as the constant w model, the linear 
model [45], the feature model (see discussions in [46]) 
and the transition model (e.g. [47]), respectively. For this 
section, we also consider a fifth 'thawing' model based 
on the paramctrization of [29] for a thawing quintessence 
which smoothly goes from w = —1 at high redshifts to 
w = —0.8 today. The transition model changes from w = 
— 1.2 at high redshifts to w = —0.8 at low redshifts, and 
we chose the transition to occur at at = 0.75 with width 
of A = 0.1. 

These functions and their reconstructions are shown 
in Fig. 7. For these reconstructions wc use our fiducial 
prior, which we define as the CPZ prior (cr^ = 0.02, a c = 
0.06) with a fiducial model as the average of bins within 
Aa = 0.06 given by Eq. (23) (effectively the five bin 
average shown in Fig. 4.) Note that primarily is used 
to normalise the high frequency priors; in fact, the local 
averaging means that effectively no prior is placed on w. 

One can see that on average the reconstructions are 
fairly unbiased, with practically no bias in the case of 
a constant w. This is expected, as the smoother func- 
tions are well represented by the modes which survive 
the prior. Models with stronger evolution, such as the 
feature model and the fast transition model, have the 
most bias. However, even for these, the average MSE is 
dominated by the variance for this choice of prior. 

The variance can be decreased by strengthening the 
prior, but this effectively allows fewer data modes to sur- 
vive and so increases the potential bias of the reconstruc- 
tions. Thus the MSE cannot be dramatically reduced ex- 
cept in those cases where the bias is intrinsically small. 
For example, the constant w modes are unbiased in this 
prescription, and so the variance can be made very small. 

The MSE, and its contributions from bias and vari- 
ance for each input model are shown in Table I. One can 
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these we then perform a blind MCMC reconstruction, 
adding to the data Xdata a similar term for the prior 
Xp r j or as described in Sec MB. 

To generate the simulated data, we set the fiducial val- 
ues of other cosmological parameters to values favoured 
by the WMAP7 observations [35], i. e. Cl m = 0.263 and 
Hq = 71 km/s/Mpc. We then simulate the SN luminosity 
data by randomly generating the SN redshifts and errors, 
while respecting their expected rcdshift density. The sim- 
ulated SN distance modulus fi of four phenomenological 
dark energy models (offset by that of the ACDM model) 
are shown in Fig. 10. We similarly randomly generate 
H(z) measurements in the redshift bins described above 
consistent with their expected errors. 



FIG. 8. Amplitudes of the various test functions expanded 
in the prior eigenmodes. Those functions which are best re- 
constructed are those dominated by the low mode numbers. 
Those with higher frequency features are reconstructed with 
more bias. 



Model 


MSE 


Bias 


Variance 


w = -1 


0.064 




0.064 


Linear 


0.068 


0.004 


0.064 


Feature 


0.081 


0.017 


0.064 


Transition (A = 0.1) 


0.071 


0.007 


0.064 


Transition (A = 0.05) 


0.095 


0.031 


0.064 


Thawing 


0.065 


0.001 


0.064 



TABLE I. A comparison of MSE results for different assumed 
models, broken down into bias and variance contributions. 



see that the MSE figures are dominated by the variance, 
with the bias term being comparable in the transition 
model when the change is very quick. This is intentional; 
a stronger prior does not significantly reduce the MSE 
for most of the models we assume (the exception being 
the constant model) , and given the choice it seems better 
to have a slightly noisier reconstruction than one which 
could be significantly biased. 

The relative biases can be easily understood by look- 
ing at the input models expanded in the eigenmodes of 
the prior, as shown in Fig. 8. We see that the models 
which are reconstructed with least bias (constant, thaw- 
ing model, slow transition model) have little support be- 
yond the fifth prior mode, while the others have some 
higher frequency components which are wiped out by the 
prior. 



C. Reconstructions from simulated data 

We next explore how our reconstruction method works 
for more realistic data, because Fisher forecasts tend to 
be optimistic and ignore potential non-Gaussianities in 
the likelihood. Here we look at single realisations of the 
potential data sets based on the models in Eq. (31). For 



For the CMB data, we generate realisations for the 
three CMB distance parameters [35] consistent with their 
expected covariances expected from Planck. Note that 
for the real data analysis, one should use the full CMB 
spectrum data, which is also sensitive to dark energy 
through the integrated Sachs- Wolfe effect. For such cal- 
culations, one must include dark energy perturbations, 
otherwise the derived constraints on cosmological pa- 
rameters will be biased [48-50]. Furthermore, one should 
treat dark energy perturbations self-consistently when w 
crosses w = —1 [51, 52]. One must be careful even with 
the CMB distance prior, which can be biased if the model 
is significantly different from the ACDM for which it has 
been derived [53]. 

We simultaneously fit the 20 bins with il m and Hq to 
our combined mock dataset using a modified version of 
CAMB/CosmoMC [54], with the smoothness prior im- 
plemented. In Fig. 11 we show the reconstructed w(a) 
from a joint mock dataset of SN + H(z) +CMB distance 
prior (Planck). As shown, our algorithm can successfully 
capture all the DE models we hide in the data, and the 
reconstruction is very accurate - the absolute value of the 
relative difference < 10% in all cases. The calculation of 
the prior Xprior adds virtually no overhead to the likeli- 
hood calculations, making the method quite efficient. 

It is important to emphasise that Figs. 7 and 11, while 
apparently very similar, are actually showing very differ- 
ent things. Fig. 7 shows the average reconstructed w(a) 
over the ensemble of data realisations consistent with the 
experiments, and the errors show the distribution of the 
reconstructed w(a) for these different data realisations. 
Fig. 11 instead shows the reconstructed w(a) for a single 
simulated data set (similar to what we expect to get from 
real observations) and its error bars represent the uncer- 
tainty in the determination of w(a) for this realisation. 
This best fit differs from the input model both because 
of the average bias seen in Fig. 7 and because of the ran- 
domness of the realised data. However, the error regions 
are similar because they are related to very similar A% 2 
functions. 
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w = - 1 



w(a) = -1.2+0.8 (1-a) 




w(a) = -l-21naexp[-ln 2 a/0.4 2 ] w(a) = -l+0.2tanh[ln(a/0.75)/0.1] 
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FIG. 9. The simulated SN magnitudes for a future deep survey as described in [36] for four phenomenological dark energy 
models listed in Eq (31). The blue solid line shows the relative SN magnitude with respect to that of the ACDM model. The 
red dashed line shows Afi = to guide eyes. 
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FIG. 10. The simulated H(a) measurements for a future half-sky survey as described in [40] for four phenomenological dark 
energy models listed in Eq (31). Again, the blue solid line shows the difference in H(a) with respect to that of the ACDM 
model. The red dashed line shows AH (a) = to guide eyes. 



VI. DISCUSSION 

Our reconstructions are very good, but some bias re- 
mains; this primarily takes the form of smoothing out the 
more quickly varying w(a) behaviours. While this can be 
made smaller by weakening the prior, the cost would be 
to increase the variance in the reconstruction. The bias 
simply reflects our assumption that smoother w(a) mod- 
els are more likely. 

This is not an intrinsic limitation of the method, but 
reflects our choice to assume that w(a) is smooth. While 
we do not have to make this assumption, we do require 
some theoretical basis where smoothness (or a measure) 
can be assumed, such as the DE scalar field potential 
V{(j)). However, this is essentially the requirement that 
the given model is well enough posed to be falsifiablc. 
Any model description should allow for the calculation 
of the prior in some effective basis, such as bins of w(a). 

We could either perform the reconstruction directly us- 
ing the parameters where the behaviour is expected to be 
smooth, or we could reconstruct in a more phenomenolog- 
ical basis, like w(a), but use a prior in this basis derived 
assuming smooth behaviour in another basis. Direct re- 



construction of something like the scalar field potential is 
better in the sense that one keeps in contact with the the- 
oretical context in which the prior makes sense; however, 
it makes it harder to compare the conclusions arising 
from different assumptions. Alternatively, we can derive 
a prior on w(a), either arising from more fundamental 
physics or purely phenomenological models by calculat- 
ing the resulting w correlation matrices C by marginal- 
ising over whatever parameters appear in the description 
(eg- [28]). 

Inevitably, the simplifying assumptions we have made 
above for the form of these correlations will not hold 
when realistic models arc considered. In particular, we 
assumed that the prior is Gaussian, and that the corre- 
lations are a function of 8a and are translation indepen- 
dent. The latter is certainly not expected, for example 
'freezing' models of quintessence will have larger vari- 
ance at high redshifts, while 'thawing' models will have 
larger variance at low redshifts. Even translating the sim- 
ple linear parameterisation, w(a) = Wo + w a (l — a) with 
Gaussian priors on wq and w a , into a correlation matrix 
on w(a) turns out not to be translation invariant; the re- 
sulting variance is smallest at a = 1, and increases with 
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w = -l w(a) = -1.2+0.8 (1-a) w(a) = -l-21na cxp[-ln 2 a/0.4 2 ] w(a) = -l+0.2tanh[ln(a/0.75)/0.1] 




0.4 0.6 0.8 1.0 0.6 0.8 1.0 0.6 0.8 1.0 0.6 0.8 1.0 

scale factor a 



FIG. 11. The reconstructed w(a) from a joint mock datasets of SN + H(z) + CMB distance prior (Planck). We show the best 
fit model (red solid), the true model we put in (black dashed) with 68 and 95% CL. error (dark and light shaded bands) in the 
upper panels, and the absolute value of the relative difference in the lower panels. Unlike the Wiener results, these are based 
on a single realisation of the data. 



1-a. 

It is possible that including many different classes of 
models together, such as thawing and freezing models, 
might homogenise the correlations to some extent. Al- 
ternatively, one could hope that such deviations of the 
covariance from our simplifying assumptions will not be 
significant enough to greatly impact the reconstructions. 
However, this requires further exploration and we will 
investigate this in future work. 

VII. CONCLUSIONS 

Here we have outlined a Bayesian method for recon- 
structing w(a), which essentially involves multiplying the 
likelihood of a non-parametric description of w(a) by a 
suitably defined prior. This prior is defined in such a 
way as to make the reconstructions smooth and inde- 
pendent of the binning choice, eliminating flat directions 
in parameter space which are a generic problem for non- 
parametric approaches. 

With the right choice of prior, we can also eliminate the 
dependence on the fiducial model; reconstructions where 
the data are poor or non-existent are instead determined 
by the assumed smoothness of the function, tending to- 
wards the mean of the measured w(a). However, appli- 
cation of a prior always implies some bias; in this case, 
the bias is largest for models which are quickly changing 
in redshift or scale factor. 

The advantages of this approach are that it makes one's 



priors explicit and it is easy to implement; once one de- 
cides on the prior parameters, e.g. the correlation length 
and the strength of the prior, it is straightforward and 
fast to evaluate the prior likelihood for whatever choice 
of binning is used. This can be combined with the data 
in an MCMC analysis, searching for the highest posterior 
solution for the w bin amplitudes; indeed, by breaking the 
parameter degeneracies, the prior will speed up MCMC 
analyses. 

With the accumulating high-quality data of SN, CMB 
and LSS, our method is an ideal tool to help reveal the 
nature of dark energy in the near future. Here we have 
focused on the methodology only, but in future work we 
will apply this prior to present data. 
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